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NONPARAMETRIC SPECTRAL ANALYSIS WITH APPLICATIONS 
TO SEIZURE CHARACTERIZATION USING EEG TIME SERIES 
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of California, Santa Barbara 

Understanding the seizure initiation process and its propagation 
pattern(s) is a critical task in epilepsy research. Characteristics of the 
pre-seizure electroencephalograms (EEGs) such as oscillating pow- 
ers and high-frequency activities are believed to be indicative of the 
seizure onset and spread patterns. In this article, we analyze epilep- 
tic EEG time series using nonparametric spectral estimation meth- 
ods to extract information on seizure-specific power and character- 
istic frequency [or frequency band(s)]. Because the EEGs may be- 
come nonstationary before seizure events, we develop methods for 
both stationary and local stationary processes. Based on penalized 
Whittle likelihood, we propose a direct generalized maximum likeli- 
hood (GML) and generalized approximate cross-validation (GACV) 
methods to estimate smoothing parameters in both smoothing spline 
spectrum estimation of a stationary process and smoothing spline 
ANOVA time-varying spectrum estimation of a locally stationary 
process. We also propose permutation methods to test if a locally 
stationary process is stationary. Extensive simulations indicate that 
the proposed direct methods, especially the direct GML, are stable 
and perform better than other existing methods. We apply the pro- 
posed methods to the intracranial electroencephalograms (IEEGs) of 
an epileptic patient to gain insights into the seizure generation pro- 
cess. 



1. Introduction. Roughly 1% of the population in developed nations suf- 
fers from epilepsy. Of these about 30% have medically refractory epilepsy, 
where the most devastating feature is seizure. The only hope for relieving 



Received May 2008; revised May 2008. 

1 Supported by the Career Development Funds from the Fred Hutchinson Cancer Re- 
search Center. 

2 Supported in part by an NSF Grant. 

Key words and phrases. EEG, epilepsy, GACV, GML, locally stationary process, per- 
mutation test, smoothing parameter, smoothing spline, SS ANOVA. 

This is an electronic reprint of the original article published by the 

Institute of Mathematical Statistics in The Annals of Applied Statistics. 

2008, Vol. 2, No. 4, 1432-1451. This reprint differs from the original in pagination 

and typographic detail. 



1 



2 



L. QIN AND Y. WANG 



these patients from the disabling seizures is resective surgery, while the surgi- 
cal success rate varies between less than 25% to 70% depending on how well 
the seizure initiation zone could be removed or the seizure propagation path 
could be disconnected [Schiller, Cascino and Sharbrough (1998)]. Therefore, 
a better understanding of the seizure initiation process and its propagation 
patterns is crucial to the success of a surgery. Epileptic patients usually 
undergo presurgical evaluations where they are monitored over time with 
EEGs recorded by electrodes placed at different locations of the brain called 
channels. The locations of the EEG channels are believed to give the best 
coverage of the epileptogenic zone, determined by the neurosurgeons. While 
EEGs from 128 or 256 channels are recorded, only a few of them cover the 
origin or critical path of the seizure generation. Therefore, the identification 
of the seizure focus and its spread patterns would improve clinical judgement 
on where to resect that would render the patients seizure free. Character- 
istics of the pre-seizure ("pre-ictal") EEGs including oscillating powers and 
high-frequency activities are believed to be indicative of the seizure onset 
and spread patterns. They behave differently from those characteristics of 
the baseline ( "inter-ictal" ) EEGs [Winterhalder et al. (2003)]. 

Spectral analysis of EEG time series plays a central role in epilepsy 
research. Due to the large heterogeneity in pathology between patients, 
seizure-specific characterization has to be initially performed within each 
subject. Identifying seizure-specific power changes and seizure-characteristic 
frequency band(s) is a key step of this endeavor and the main focus of this 
paper. Methods proposed in this paper provide a systematic tool for this 
key step. As shown in the simulations, the proposed direct GML and GACV 
methods are more robust and efficient than existing methods. 

Spectral analysis is an important field in time series analysis [Brillinger 
(1981), Shumway (2000)]. The spectrum is often used to describe the power 
distribution and stochastic variation of a time series. For stationary time 
series, harmonic trends and power evolution can be explained by the spec- 
trum at different frequencies. It is well known that stationarity does not 
always hold for EEG time series [Lopes da Silva (1978)]. Locally station- 
ary processes have been proposed to approximate the nonstationary time 
series [Dahlhaus (1997), Ombao et al. (2002), Guo et al. (2003)]. The time- 
varying spectrum of a locally stationary time series characterizes changes of 
the stochastic variation over time that may reflect important patterns of bi- 
ological activity. In particular, our spectral analysis of the EEGs can be used 
to aid the clinical judgement in two ways: first, due to the one-to-one corre- 
spondence between the spectrum and the variance of the EEG time series, 
changes in brain power before, during and after a seizure can be character- 
ized by the estimated time-varying spectra of the pre-ictal and inter-ictal 
EEGs; second and more importantly, from a signal processing point of view, 
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the phase information between two or more channels reveals whether the sig- 
nals are synchronized, and at what frequencies (or frequency bands) if they 
are. This information provides an important marker of seizure initiation and 
localization [Mormann et al. (2003)]. By comparing the pre-seizure time- 
varying spectrum with that of the baseline (seizure-free) segments within a 
channel, the channel-specific seizure-characteristic frequencies can be identi- 
fied (if such characteristics exist). If the seizure-specific frequency band(s) in 
one channel overlaps with that in another channel, then there will likely be 
the signal coupling between the two locations. Resecting either of the loca- 
tion would help to reduce surgical failure. However if the frequency band(s) 
do not overlap, then there may be across-frequency interactions that asso- 
ciated with the seizure propagation patterns [Mormann et al. (2000)]. 

As in most epileptic EEG studies, we analyze the intracranial EEGs 
(IEEGs) as the recordings have less artifacts. Figure 1 shows 5-minute 
IEEG segments from a patient with medicine-resistant mesial temporal lobe 
epilepsy right before a seizure's clinical onset (upper panel) and at base- 
line (lower panel) extracted at least four hours before the seizure's onset. 
The data were collected by the EEG Lab of University of Pennsylvania 
[D'Alessandro et al. (2001)]. It is visually unclear what the seizure-specific 
frequency band and its time-varying patterns are for both channels. We will 
describe analyses of this data in Section 5. 

It is well known that the periodograms is an unbiased, but not consis- 
tent, estimator of the spectrum. Therefore, periodogram smoothing is a 
popular tool for nonparametric spectra estimation. Smoothing techniques in- 
cluding kernel smoother [Lee (1997), Ombao et al. (2001), Hannig and Lee 
(2004)], smoothing spline [Wahba (1980), Pawitan and O'Sullivan (1994), 
Guo et al. (2003)], regression spline [Kooperberg, Stone and Truong (1995)], 
local polynomial [Fan and Kreutzberger (1998)] and wavelet [Gao (1997)] 
have been applied to obtain consistent estimators. One may smooth peri- 
odograms directly [Lee (1997), Ombao et al. (2001), Hannig and Lee (2004)], 
smooth logarithmic periodograms [Wahba (1980), Guo et al. (2003)], or use 
Whittle likelihood [Pawitan and O'Sullivan (1994), Guo et al. (2003)]. In 
this paper, we consider smoothing spline estimation based on the Whittle 
likelihood. 

It is well-recognized that the choice of smoothing parameters is crucial to 
the performance of the smoothing methods. Many methods have been devel- 
oped for kernel smoother [Lee (1997), Ombao et al. (2001), Hannig and Lee 
(2004)] and smoothing spline [Wahba (1980), Pawitan and O'Sullivan (1994)]. 
Generalized cross-validation (GCV), generalized maximum likelihood (GML) 
and unbiased risk (UBR) criteria can be used to select the smoothing param- 
eter when fitting smoothing spline models on the logarithms scale [Wahba 
(1990)]. The logarithmic transformation is the first order approximation 
which is less efficient than the Whittle likelihood [Pawitan and O'Sullivan 
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Baseline lEEGs: Channel 1 Baseline lEEGs. Channel 2 




Time (Second) Time (Second) 



Fig. 1. IEEG segments from two channels of an epileptic patient. The upper panels show 
5 minute preseizure segments with the seizure's onset at the 5th minute. The lower panels 
show 5 minute baseline segments collected hours away from the seizure. The sampling rate 
is 200 Hertz. The total number of time points is 60, 000 for each segment. 



(1994), Fan and Kreutzberger (1998), Guo et al. (2003)]. For fitting smooth- 
ing splines using penalized Whittle likelihood, Pawitan and O'Sullivan (1994) 
developed a criterion based on an estimate of the risk function for the selec- 
tion of the smoothing parameter. Guo et al. (2003) used an indirect GML 
method to select the smoothing parameter. The indirect approach does not 
guarantee convergence and may have inferior performance than direct meth- 
ods (Section 4). 

We develop new direct data-driven methods for selecting smoothing pa- 
rameters in the estimation of a spectral density and tests for the hypothesis 
that a locally stationary process is stationary. The rest of the paper is struc- 
tured as follows. We present direct GML and GACV methods for stationary 
processes in Section 2. We develop direct GML method, GACV method and 
stationarity tests for locally stationary processes in Section 3. The simula- 
tion study is summarized in Section 4. The analysis of IEEG time series is 
presented in Section 5. We conclude with some remarks in Section 6. 
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2. Stationary processes. 

2.1. Notation. Let Xt, t = 0, ±1, ±2, . . . , be a stationary time series with 
mean zero and covariance function j(u) = ~E>{XtXt+ u ). The second-order 
properties of Xt are equivalently described by the spectrum 

oo 

(1) /M= l(u)exp(-i27ruju), w€ [0,1], 

u=— oo 

where the imaginary unit i 2 = —1. 

Let Xq,X\, . . . ,Xt-i be a finite sample of the stationary process and 

2 



(2) y k = T- 1 



T-l 



^ X t exp(i2Trkt/T) 



t=o 



k = 0,...,T-l, 



be the periodogram at frequency u>k = k/T. Our goal is to estimate / based 
on observations in the form {{uk,yk),k = 0, . . . ,T — 1}. Under standard mix- 
ing conditions [Brillinger (1981)] 

(3) Vk~ f(u>k)xl,d k /dk, 

where xt. 

dk are independent Chi-square random variables with degree of 
freedom = 1 for k = and k = T/2 (if T is even) and dk = 2 for 1 < 
k<T/2- 1. As [Pawitan and O'Sullivan (1994) and Guo et al. (2003)], for 
simplicity, the slight difference in degrees of freedom will be ignored and all 
degrees of freedom are set to two. 

2.2. Smoothing spline estimation. We model the logarithm of the spec- 
trum g = log(/) using a periodic spline space [Wahba (1990)] 

W^(per) = < g:g and g are abs. cont., 

(4) 

5 (0) = </(!), </(0) = </(!), / G?») 2 du;<oo 



o 



W2(per) = {1} © W2 (per), where {1} is the one-dimensional space consisting 
of all constant functions, and W® (P er ) i s a reproducing kernel Hilbert space 
with reproducing kernel R\{uji,uj2) = — B±(\u\ — oj2\)/2&, is the frac- 

tional part of uji - uj 2 and B 4 (u) = (uj - 0.5) 4 - (u - 0.5) 2 /2 + 7/240 [Wahba 
(1990), Gu (2002)]. We note that [Wahba (1980) and Pawitan and O'Sullivan 
(1994)] essentially used the same model space with solutions approximated 
using cepstral coefficients. 

Because g(uj) is symmetric about uj = 0.5, it suffices to estimate g{ui) for 
to € [0,0.5]. As in Wahba (1980) and Pawitan and O'Sullivan (1994), we use 
all y^s even though y^ = yx-k- This is to allow periodic smoothing. Let 
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g k = g(oJk)- Based on (3), we estimate g as the minimizer of the following 
penalized Whittle likelihood [Gu (2002)] 

T-1 ! 

(5) y2{g k + y k exp(-g k )} + TX / {g"(uj)} 2 dw, 

where A is a smoothing parameter balancing the goodness-of-fit and the 
smoothness of the function g. 

For a fixed A, the solution to (5) can be represented as [Wahba et al. 
(1995), Gu (2002)] 

T-1 

(6) g(u) = d + ^2 c k Ri{uJ k ,uj). 

k=0 

We compute coefficients d and c k s iteratively using the iterative reweighted 
penalized least squares (IRPLS) method [McCullagh and Nelder (1989), 
Wahba et al. (1995)]. Specifically, at each iteration, we compute the working 
variable z k = g k + y k ex.p(—g k ) — 1 and weight w k = 1 where tilde indicates 
current estimates. Note that, as [Pawitan and O'Sullivan (1994)], we use the 
Fisher scoring method instead of the Newton-Raphson method employed in 
Wahba et al. (1995), Gu (2002) and Liu, Tong and Wang (2006). Our ex- 
perience indicates that the Fisher scoring method is more stable in this 
situation. One may select A using the GCV, GML or UBR criterion at each 
iteration of the above IRPLS procedure [Wahba et al. (1995)]. However, such 
an indirect approach may lead to nonconvergence of the algorithm (Section 
4). 

2.3. Direct GML and GACV methods. We simply introduce the direct 
GML and GACV criteria in this section. Derivations can be found in sup- 
plementary materials. 

Consider the following prior for g: 

(7) G(uj) =a + (TA)~ (1/2) VF(w), 

where a~A r (0,a), W(u>) is a Gaussian process independent of a with 
E{W{uj)} = and EiWiuj^W^)} = Ri(ui,u 2 ). [Gu (1992)] showed that, 
as a — > oo, the posterior distribution of G(u) can be approximated by a 
Gaussian distribution with posterior mean equals the spline estimate g. 
This connection between smoothing splines and Bayesian models has been 
exploited to develop methods for selecting smoothing parameters for esti- 
mating variance functions [Liu, Tong and Wang (2006)] and constructing 
confidence intervals [Gu (1992), Wahba et al. (1995)]. We use this connec- 
tion to develop a direct GML criterion for selecting A. 

Let y = (yo,...,y T -iY, g= (go, ■ ■ ■ ,9t-i)', S = (1, . . . , 1) be a vector of 
size T, S = {Ri(uJi,u>j)}Jj =1 , (Q\Q2){R' , 0')' be the QR decomposition of S, 
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and U DU' be the spectral decomposition of Q^Q2 where D = diag(5i, . . . , 
6 T -i) and 8 V are eigenvalues. Let g k = g(u k ), u k = 1 - y fc exp(-# fc ), g = 
(go, . . .,pr_i)', u c = (u , . . . ,itr-i)', y c = g - u c and z = (zi,.. .,zr-i)' = 
U'Q' 2 yc- Ignoring some constants, the negative log marginal likelihood of y 
can be approximated by 

T-i 1 
GML(A) = {9k + Vk exp(-ff fc )} - -u c 'u c 

. , k=0 
8 



2 sai 

The GML criterion (8) is new to the estimation of the spectrum. We 
minimize (8) to find an estimate of A which is referred to as the direct GML 
estimate. 

As in Lin et al. (2000), consider the comparative Kullback-Leibler crite- 
rion 

2 T ~ l 

(9) CKL(g,g) = -Y^{eM9i-9i)+9i}- 

i=0 

The CKL criterion cannot be used directly to select the smoothing param- 
eters since it depends on the true log-spectrum. We need a proxy for this 
criterion. Let g^~ 1 ^ be the estimate of g without the ith observation j/j, that 
is, <p is the minimizer of the penalized Whittle likelihood 

y2{9k + Vkexp(-g k )} +TX [ {g" (uj)} 2 du; . 

Let g k =g^~ i \u k ). The leaving-out-one cross-validation estimate of the 
CKL criterion is 

T— 1 

(10) CV(A) = - J2imeM-t l) ) +9i}- 

i=0 

The function CV(A) may be used to select the smoothing parameter. 
However, the computation is usually expensive. An approximation of CV(A) 
is 

T-l 

GACV(A) = {Vi exp(-&) + 9i} 

(11) 

tiH T ~ l 

H m m ^ e ^p(-9i){yi - exp(^)}, 

T-trW 1/2 HW 1/2 S 



<s 
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where H = (W + nXQ) 1 V, W = diag(yi exp(-&)/2, . . . ,y n exp(-g n )/2), 
^ = Q2{Q'2^Q2) + Q'21 + represents the Moore-Penrose generalized inverse, 
V = diag(exp(-pi)/2, . . . , exp(-g n )/2) and W = diag(exp(gi), . . . , exp(g n )). 
Equation (11) is referred to as the GACV criterion and a GACV estimate 
of A is the minimizer of this criterion. 

3. Locally stationary processes. 

3.1. Local periodograms. Let Xt,t = 0, . . . ,T — 1, be a finite sample of 
the following mean zero locally stationary process [Guo et al. (2003)]: 

(12) X t = I A(tJ,t/T)exp(i2iru;t)dZ(u), 

Jo 

where Z(uS) is a zero-mean stochastic process on [0, 1] and A(oj,u) denotes 
a transfer function with continuous second order derivatives for (uj,u) € 
[0,1] x [0,1]. 

The time-dependent spectrum f(uj,u) = \\A(u;,u)\\ 2 is assumed to be a 
smooth function in both u and u. For estimation, local periodograms are 
calculated on some pre-defined time- frequency grids. Specifically, the time 
domain is partitioned into J disjoint blocks [bj,bj + i) where = b\ < b<i < 
• • • < bj < bj+i = 1. Let Uj = (bj + bj + \)/2 and cok's be K frequencies in 
[0, 1]. Then the local periodograms are calculated as 

\b j+1 -bj\ 

k = l,...,K, j = l,...,J. 

Guo et al. (2003) suggested that the block size should be in the order of 
T 1 / 2 and showed that the estimation is not very sensitive to the choices of 
sizes for the time and frequency grids. We investigate the impact of these 
choices in our simulations. 

3.2. SS ANOVA estimation. As in Guo et al. (2003), we model loga- 
rithm of the time-dependent spectrum g(w,u) =log/(u;,u) using the SS 
ANOVA model 

(14) g{u>,u) = Pi+ fciu - 0.5) + si(o/) + s 2 (u) + s 3 (uj,u) + s 4 (w,ti), 

where ^(u — 0.5) and S2(u) are linear and smooth main effects of time, s\(u>) 
is the smooth main effect of frequency, and s^(uj, u) and 54(0;, u) are linear- 
smooth and smooth-smooth interactions between frequency and time. Let 
7 = (co,u) and F = (7j)™ =1 be the selected time-frequency grid with corre- 
sponding log-local periodograms y = (yi, ...,y n ) = (log(Jii), . . . , log^j)) 



(13) 



I(u k ,Uj 
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where n = KJ. We estimate g as the minimizer of the penalized Whittle 
likelihood 

n 4 

(15) ^{gi + yiex.p(-gi)} + nJ2 X r \\P r g\\ 2 , 

i=l r=l 

where gi = g(~fi), A r 's are smoothing parameters and P r is the projection 
operator onto the subspace corresponding to s r , r = 1, . . . ,4. Let A r = X/9 r . 
The solution to (15) is [Gu (2002)] 

n 4 

(16) g(rf) = di + d 2 (u - 0.5) + J2 tH J2 OrRr (Ji , 7) , 

i=l r=l 

where Ri(ui,U2) was defined in Section 2, R 2 (ui,u 2 ) = B%(u\)B2(u2)/4: — 
B a ([uji - w 2 ])/24, R 3 ((u 1 ,u 1 ),(u 2 ,u 2 )) = Ri(uj\,u)2){ui - 0.5)(it 2 - 0.5), 
i? 4 ((o;i,ui), (^2,^2)) = -Ri(wi,W2)-R2(ni,M 2 ), and -B 2 (u) = (it — 0.5) 2 - 1/12. 
Again, for fixed A r 's, coefficients d\, d 2 and Cj's can be computed using the 
IRPLS procedure. Guo et al. (2003) selected A r 's at each iteration using the 
GML method. Again, the indirect approach may lead to nonconvergence 
and inferior performance (Section 4). 

3.3. Direct GML and GACV methods. We simply introduce the direct 
GML and GACV criteria in this section. Derivations can be found in sup- 
plementary materials. 

Consider the following prior for g: 

4 

(17) G{i) = 01 + a 2 (u - 0.5) + (nA)- (1/2) £ 6y 2 W r (j), 

r=l 

where a = (a±, a 2 )' ~ N(0, al), W r (7)'s are Gaussian processes independent 
of a with E{W r (j)} = and E{W r (-f)W r (C)} = R T {l, 0- 

Let g=(gi,...,g n )', S = {l, Ul -0.5}f =1 , ^e = {Et=iOrRrh l ,l j m^ =l , 
(QiQ2)(R' , O'Y be the QR decomposition of S, and UDU' be the spectral 
decomposition of Q' 2 T,qQ 2 where D = diag(#i, . . . ,S n - 2 ) and 5 U are eigenval- 
ues. Let g i =g(-y i ), Ui = 1 - y { exp(-ffi), g = (<?i, ... ,<?„)', u c = (m, ... ,«„)', 
y c = g — u c , and z = (zi, . . . , z n _2)' = U'Q' 2 y c . Ignoring some constants, an 
approximation to the negative log marginal likelihood of y is 



GML(A,0) = ^2{gi + yiexp(-gi)} - -u c 'u c 
i=i A 
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where = (6\, . . . ,64)' . Note that 8 u 's and z„s depend on 0. The direct 
GML estimates of the smoothing parameters (A, 6) are the minimizers of 
(18). 

The comparative Kullback-Leibler criterion and its cross-validation esti- 
mate are defined similarly as those in Section 2.3. Then an approximation 
to the cross-validation estimate of the CKL criterion is 

n 

GACV(A,0) = £{»iexp(-&) +9i} 

i=l 

trH 

H 1J2 1J2 Hyi ex P(-9i){Vi -exp(gi)}, 

n - tr W 1 HW ' i= i 

where H = (W + nXty^V, W = diag(yi exp(-&)/2, . . . , y n exp(-g n )/2), 
= Q 2 (Q / 2 S Q 2 )+Q^ V = diag(exp(-5i)/2,...,exp(-5 n )/2) and ^ = 
diag(exp(<7i), . . . ,exp(<? n )). The GACV estimates of the smoothing parame- 
ters (A, 6) are the minimizer of (19). 



3.4. Permutation tests for stationarity. It is often of interest to test if a 
locally stationary process is stationary: 

Hq : h(u) = for all u vs. H\ : h(u) 7^ for some u, 

where h(u) = ^{u — 0.5) + s 2 (^) + ss(uj, u) + Si(u, u). The model under Hi 
is the full SS ANOVA model (14). Denote the model under Hq, g(u,u) = 
Pi + si(cu), as the reduced model. Let g F and g R be estimates of g under the 
full and reduced models, respectively. Let Dp = J2?=i{df + Ui ex -P(~9f) ~ 
log yi - 1} and D R = £™ = i{gf + m exp(-gf) - logy* - 1} be deviances under 
the full and reduced models. We construct two statistics: 

Si = D R -D F , 

S 2 = f [ {g F (uJ,u) - g R (uj,u)} 2 dujdu, 
Jo Jo 

where Si corresponds to the Chi-square statistics commonly used for gener- 
alized linear models, and S2 computes the L2 distance between g F and g R . 
We reject Hq when these statistics are large. The null distributions of these 
statistics are unknown. Under Hq, g does not depend on u. Therefore, we 
can use permutation to compute null distributions. Specifically, we generate 
permutation samples by shuffling time grid, compute two statistics for each 
permutation sample, and compute p-values as the proportion of statistics 
based on permutation samples which are larger than those based on the 
original data. Small scale simulations in the supplemental materials indicate 
that the permutation tests perform well. 
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4. Simulations. 

4.1. Simulations for stationary processes. We first conduct simulations 
to evaluate the performance of the direct GML and GACV methods for 
spectral density estimation of stationary processes. We simulate data from 
two processes used in Wahba (1980) and Pawitan and O'Sullivan (1994): 

1. AR(3): X t = 1.4256AVi - 0.7344AV 2 + 0.1296AV 3 + e t , 

2. MA(4): X t = e t - 0.3e t -i - 0.6e t . 2 - 0.3e t . 3 + 0.6et_ 4 , 

where et 1 ~ ' N(0, 1). We consider two sample sizes, T = 128 and T = 256, 
for each process. For each setting, we repeat simulation 1000 times. 

To compare with the method in Wahba (1980), we fit cubic periodic 
splines to the logarithmic transformed periodograms z k = \og(y k ) + c k where 
c fc = 0.57721 for k ^ and k + T/2 and c k = 0.30135 for k = or jfe = T/2. 
We use the GML method to select the smoothing parameter on the log- 
arithmic scale. Other methods such as GCV have also been tested and 
the comparative results remain the same. To compare with the method in 
Pawitan and O'Sullivan (1994), we fit a cubic periodic spline model using 
the penalized likelihood (5) with the smoothing parameter A selected as the 
minimizer of 

T-l 

(20) RE(X) = J2(9k-v k ) 2 + 2tr{H(X)}, 

k=0 

where v k = g k + yk ex p(~9k) ~ 1 an d H(X) is the smoother matrix at con- 
vergence of the IRPLS procedure with the fixed A. The criterion (20) is a 
direct adaption of (12) in Pawitan and O'Sullivan (1994). v^s need to be 
calculated with A close to the optimal choice. Pawitan and O'Sullivan (1994) 
suggested a two-step procedure: first choose A that minimizes RE(X) using 
v k = 9k + Vk exp(— gk) — 1 and denote the resulting estimates as g^p, and then 
choose A that minimizes RE(X) using = g^ v + Uk^p(~9kp) — 1- We use 
the same two-step procedure with a uniform (on a logarithmic scale) grid 
search over the range e" ~ 25 < A < e _1 . As Pawitan and O'Sullivan (1994), 5 
values are used in the first step and 50 values are used in the second step. 
For each simulation replication, we compute the mean squared error as 

1 

MSE m = — ^2,{g k - g k ) 2 , 
k=0 

where m represents the method employed to estimate g.m = LS, IM, DM, DV 
and PO, respectively, represent the method in Wahba (1980), the indirect 
GML method, the direct GML method, the GACV method and the method 
in Pawitan and O'Sullivan (1994). The indirect GML method selects A at 
each iteration of the IRPLS procedure using the GML criterion. 
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Fig. 2. Boxplots of MSEs on logarithm scale for the A R3 process. 

For the indirect GML method, several replications failed to converge (Ta- 
ble 1). The GACV method failed to converge in one replication. None of 
the other methods has this problem. The boxplots of MSEs are shown in 
Figures 2 and 3. As Pawitan and O'Sullivan (1994), we compute the rela- 
tive efficiencies MSE m /MSEDM for m = LS, IM, DV and PO. The medians 
of these relative efficiencies are listed in Table 1. From Figures 2 and 3, 
the MSEs based on the LS, IM, DV and PO methods have heavier tails 
than those from the DM method. Thus the mean relative efficiencies can 
be much bigger than medians (Table 1). We conclude that the direct GML 
is stable and has the best overall performance. Even though involving a 
somewhat ad hoc two-step procedure, the PO method performs well. One 
problem with the PO method is that it sometimes undersmooth the spec- 

Table 1 

Median (first row for each T) and mean (second row for each T) relative efficiencies 

(relative to the DM method) 



AR3 



MA4 



T 


LS 


IM 


DV 


PO 


LS 


IM 


DV 


PO 


128 


1.51 


1.22 (2) 


1.41 


1.08 


1.30 


10.37 (7) 


1.77 


1.03 




1.82 


20.06 


3.70 


1.45 


1.41 


10.83 


2.29 


1.12 


256 


1.49 


1.19 (4) 


1.19 


1.06 


1.33 


11.80 (11) 


1.09 (1) 


0.99 




1.83 


16.87 


3.13 


1.45 


1.43 


13.76 


1.44 


1.05 



Numbers in the parentheses are the number of replications out of 1000 simulation repli- 
cations that the indirect GML and GACV methods failed to converge. 
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trum. The LS method is less efficient. The indirect GML method sometimes 
fails to converge and performs very badly for the MA4 process. The GACV 
method occasionally fails to converge due to numerical problems. Its per- 
formance is inferior to the direct GML when it converges. This is especially 
true in the case when n is small where GACV has many large MSEs due 
to undersmoothing, a phenomenon has been previously observed for the 
GCV method [Wahba and Wang (1993)]. We have conducted simulations 
with other stationary processes and sample sizes. The comparative results 
remain the same. 

4.2. Simulations for locally stationary processes. To investigate the per- 
formance of the direct GML and GACV methods for locally stationary pro- 
cesses, and compare them with the LS and indirect GML methods, we sim- 
ulate locally stationary time series from two time-varying spectra: 

1. LSI: g(u),u) = 4 + sin(2vru) + log{1.25 - cos(2vrw)}. 

2. LS2: g(cj,u) = 5 - 8(w - 0.5) 2 + sin{27rexp(u)} + 0.01(w - 0.5) 2 x 
sin{27r exp(u)}. 

Similarly to [Guo and Dai (2006)], the time series Xt,t = 0, . . . ,T — 1, are 
simulated based on the following relationship: 

T-l 

(21) X t = exp{g(k/T,t/T)}exp(27Tkti/T)Z k , 

k=0 

where = Z(k/T) are mutually independent random variables distributed 
as complex Normal with mean zero and variance 1/T. Z^ = Zx-k when 
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Fig. 3. Boxplots of MSEs on logarithm scale for the MA4 process. 
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Fig. 4. Boxplots of MSEs on logarithm scale for the LSI process. Numbers in the x-axis 
labels resent three settings of time-frequency grids: 1 for (64, 16), 2 for (32,32) and 3 for 
(16,64). 

k/T ^ 0,0.5, or 1. When k/T = 0,0.5, or 1, Z k = Z(k/T) are mutually 
independent random variables distributed as real Normal with mean zero 
and variance 1/T. We consider two sample sizes, T = 1024 and T = 2048, 
for each locally stationary process. To assess the impact of block sizes, we 
consider three time-frequency grids: (K, J) = (64, 16), (K, J) = (32, 32) and 
(K,J) = (16,64). For each setting, we repeat simulation 100 times. 
We compute the mean squared error as 

K J 

MSE m = £ Y,(9kj ~ 9 kj ) 2 /(KJ), 
fe=ij=i 

where m = LS, IM, DM and DV which correspond to the method based on 
logarithmic transformation, the indirect GML method in [Guo et al. (2003)] 
and the direct GML method and the GACV method. 

The boxplots of MSEs are shown in Figures 4 and 5. Table 2 lists median 
relative efficiencies and the number of replications that the indirect GML 
and GACV methods failed to converge. The comparative results are similar 
to those in the stationary case: the indirect GML and GACV methods may 
fail to converge and the direct GML is stable and has the best performance. 
When converged, the GACV has comparable performance to the direct GML 
method. However, the GACV method takes much longer to compute. There- 
fore, the direct GML method is recommended. The estimation is not very 
sensitive to the choices of block sizes. We have conducted simulations with 
other sample sizes and block sizes. The comparative results remain the same. 
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Table 2 

Median relative efficiencies (relative to the DM method) 



T 


(K,J) 




LSI 






LS2 




LS 


IM 


DV 


LS 


IM 


DV 


1024 


(64,16) 


1.49 


1.14 (4) 


0.98 


1.49 


1.27(13) 


1.02 (1) 




(32,32) 


1.55 


1.10 (6) 


0.91 


1.38 


1.22 (13) 


1.06 (1) 




(16,64) 


1.47 


1.14 (11) 


0.99 


1.39 


1.21 (10) 


1.18 


2048 


(64, 16) 


1.48 


1.17 (6) 


1.05 


1.50 


1.33 (12) 


1.07(1) 




(32,32) 


1.51 


1.13 (5) 


1.12 (1) 


1.53 


1.28 (8) 


1.18(1) 




(16,64) 


1.56 


1.19 (3) 


0.95 (1) 


1.53 


1.16 (10) 


1.26(1) 



Numbers in parentheses are the number of replications out of 100 simulation replications 
that the indirect GML and GACV methods failed to converge. 



5. The IEEG analysis. Figure 1 shows the IEEGs of 5-minute preseizure 
and baseline segments from two channels. These two channels recorded 
IEEGs from adjacent electrodes in the mesial temporal lobe of the brain 
believed to be most relevant to the seizure. Therefore, we expect to see close 
connections in the power and frequency activities between both channels. 
The important clinical questions are: (1) does the characteristic frequencies 
change over time during the pre-ictal stage of the seizure as compared to 
the baseline in each channel? and (2) are such frequencies common across 
both channels? We answer these questions by analyzing the IEEGs using 
our estimation procedure. 
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Fig. 5. Boxplots of MSEs on logarithm scale for the LS2 process. Numbers in the x-axis 
labels resent three settings of time-frequency grids: 1 for (64, 16), 2 for (32,32) and 3 for 
(16,64). 
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Fig. 6. Time-varying spectra estimates (in log scale) for the preseizure segments (upper 
panels) and the baseline segments (lower panels) of the 5-minute IEEG segments from 
channel 1 (left) and channel 2 (right). 



We assume that the IEEG segments are locally stationary. To remove 
artifacts, we first normalize the raw IEEG recordings by subtracting their 
across-channel means. Each segment is then partitioned into 64 time blocks. 
Thirty-two equally-spaced frequency points are selected to calculate the local 
periodograms. Specifically, the time-frequency grids for the calculation are 
(w fe ,u j ) = (A;/33,(938 x j - 468.5)/60000) for k = l,.. . ,32, j = 1, ... ,63 and 
(a; fc ,u 64 ) = 0/33, 0.9925) for k = 1, . . . ,32. Figure 6 shows the SS ANOVA 
estimates of time-varying spectra for both the baseline and preseizure seg- 
ments of the two channels. The direct GML method is used for all fits in this 
section. Because the sampling rate is 200 HZ and the spectrum is symmetric 
around 100 HZ, we can only assess power changes in frequency bands ([0 
HZ-100 HZ]). It appears that the baseline spectrum does not vary much 
over time. 

For channel 1, the p- values for testing stationarity based on two statis- 
tics are 0.81 and 0.73 for the baseline segment and 0.01 and 0.01 for the 
preseizure segment. For channel 2, the p-values are 0.31 and 0.16 for the 
baseline segment and and 0.0067 for the preseizure segment. We conclude 
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that for this data set, the processes far away from the seizure's clinical onset 
can be regarded as stationary while the processes close to the seizure's clin- 
ical onset is nonstationary. As expected the pre-ictal spectra have similar 
shapes, indicating the possible connectivity between the power evolutions of 
the two adjacent channels. 

To find significant changes of the preseizure power spectra from those of 
the baseline segments, we compute 95% Bayesian confidence intervals for 
the estimated preseizure power spectra using the same method described 
in Wahba et al. (1995). We also compute the difference in estimated power 
spectra between preseizure and baseline IEEGs. At a particular point of 
time and frequency, the difference is deemed significant if the estimated 
baseline power spectrum is outside the 95% confidence interval. Figure 7 
shows the contour plots of estimated power spectra differences for time- 
frequency regions where the differences are significant. 

In channel 1, a high-frequency power discharge (at [75 HZ-100 HZ]) was 
recorded during approximately —300 to —280 seconds before the seizure; 
then a power build-up was captured for the next 85 seconds at [10 HZ-40 
HZ], followed by another significant power decrease at both high and low 
frequency ends ([75 HZ-100 HZ] and <5 HZ). However, the channel 2 IEEGs 
recorded significant power discharges as a broad band activity ([75 HZ-100 
HZ] and [20 HZ-40 HZ]) 5 minutes before the seizure. Interestingly, in this 
channel power increased around the same time as channel 1 (—270 to —150 
seconds), but at lower frequencies (<10 HZ). This phenomenon implies that 
the short term power build up may be regarded as a warning signal of the 
coming seizure. 

For both channels, the common frequency band for decreased power is 
within [75 HZ-100 HZ]. While there are no common frequencies at which 
the power increase occurred, the across-frequency interaction between the 
two channels appeared during —270 to —150 seconds. By further studying 
the within- frequency (at [75 HZ-100 HZ]) and across-frequency (between 
[10 HZ-40 HZ] and [0 HZ-10 HZ]) signal coupling, the seizure origination 
and propagation path may be inferred, which would eventually allow pre- 
ventive action or surgical resection to take place at the right location to 
prevent a future seizure. Our results conform with recent findings in the 
literature. Specifically, high-frequency oscillations are usual suspects of the 
electrical activities for the epileptic brain, which may be important markers 
for epileptic network function ([80 HZ-500 HZ]) [Worrell et al. (2004)], and 
7-band ([25 HZ-60 HZ]) activity may be associated with a reliable warning 
of the seizure [Aksenova, Volkovych and Villa (2007)]. 

6. Discussion. We have developed methods and software for smoothing 
spline and SS ANOVA spectrum estimation of stationary and locally station- 
ary processes. These methods are recommended since they are stable and 
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Fig. 7. Contour plots for the significant differences in power spectra between the pre- 
seizure IEEGs and baseline IEEGs (defined by subtracting the spectra estimates of the 
baseline segments from those of the preseizure segments). The positive differences (solid 
lines) and negative differences (dashed lines) indicate significant increases and significant 
decreases in power at 5% significance level. For both channels, the common frequency band 
for power discharge is within [75 HZ- 100 HZ], occurred as early as 5 minutes before its 
onset, while the simultaneous power build-up between — 270 and —150 seconds happened at 
different frequencies ([10 HZ-J^O HZ] for channel 1 and <10 HZ for channel 2). 



perform better than other existing methods. The risk estimation method 
by Pawitan and O'Sullivan (1994) may also be extended to the SS ANOVA 
spectrum estimation of locally stationary processes. It is unlikely that such 
an extension will perform better than the direct GML method. For a sta- 
tionary process, similar permutation tests can be constructed to test if its 
spectrum is a constant which amounts to a white noise series [Fan and Zhang 
(2004)]. 

Our analyses have successfully picked up the important time-varying be- 
haviors of the power and frequency components of the IEEG channels, 
which may be indicative of the seizure onset and propagation patterns 
[Worrell et al. (2004), Mormann et al. (2003), Mormann et al. (2000), Schiller, 
Cascino, Busacker and Sharbrough (1998)]. Future analyses would include 
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adopting the technique for multiple channels and multiple seizures. With 
the aid of clinical judgements from neurologists and neurosurgeons, seizure 
propagation patterns may eventually be uncovered. 

Even though we emphasize the application to the spectral analysis of 
EEG time series, our methods are general with a wide range of applications 
including neurological cognition studies. And they may be applied to under- 
stand the periodic secretion patterns of hormone time series and assess the 
spectral properties of the magnetoencephalography (MEG) time series. 

Acknowledgments. The authors thank the Editor, Associate Editor and 
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earlier draft. 

SUPPLEMENTARY MATERIAL 

Derivation of the direct GML criterion permutation test of stationarity: 
SOM for nonparametric spectral analysis (DOI: 10.1214/08-AOAS185SUPP; 
.pdf). The supplementary material [see Qin and Wang (2008)] contains 
derivation of the direct GML criterion, derivation of the GACV criterion, 
simulations for permutation tests of stationarity and R code for spectrum 
estimation. 
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